Fig. 3.23. An illustration of the introduction of the momentum term.

In order to improve the learning quality of an MLP model, the above update rule has been altered by introducing the momentum term [Rumelhart, et al., 1986], which is based on the update of the model parameters in the previous learning cycle. The momentum term can effectively prevent over-update of model parameters. The use of the momentum term is shown below, where $0 < \alpha < 1$ is called the momentum factor,

$$\Delta w_{t+1} = -\eta \frac{\partial \varepsilon}{\partial w} + \alpha \Delta w_t \qquad (3.36)$$

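In the above equation, $\Delta w_t$ stands for the update of $w$ at learning cycle $t$, and $\Delta w_{t+1}$ stands for the update of $w$ at learning cycle $t + 1$. It can be seen that the gradient term $\eta\,\partial\varepsilon/\partial w$ and the momentum term $\alpha \Delta w_t$ enter Eq. (3.36) with opposite signs, so they can point in different directions. Whenever the term $\eta\,\partial\varepsilon/\partial w$ goes too far, the term $\alpha \Delta w_t$ will pull the move backward.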

Therefore the momentum term can effectively reduce oscillation and prevent a potential move in a wrong direction, so as to avoid a saddle point on the error function curve. As shown in Figure 3.23, it can stop the move from one point $x$ to another, a move which would otherwise lead the search to the saddle point.
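To make Eq. (3.36) concrete, here is a minimal NumPy sketch of gradient descent with the momentum term. The error function (a two-dimensional quadratic that is steep along one axis and shallow along the other), the values $\eta = 0.18$ and $\alpha = 0.5$, the starting point, and the 50-cycle budget are all hypothetical choices for illustration, not taken from the text. Setting $\alpha = 0$ recovers the update rule without momentum, so the two runs can be compared directly.

```python
import numpy as np

# Hypothetical toy error function (not from the text): an ill-conditioned quadratic
# eps(w) = 0.5 * (1 * w[0]**2 + 10 * w[1]**2), shallow along w[0], steep along w[1].
CURVATURE = np.array([1.0, 10.0])


def error(w):
    """Value of the toy error function eps(w)."""
    return 0.5 * np.sum(CURVATURE * w**2)


def gradient(w):
    """Partial derivatives d eps / d w of the toy quadratic."""
    return CURVATURE * w


def train(eta, alpha, steps, w0):
    """Gradient descent using the momentum update of Eq. (3.36):
    delta_{t+1} = -eta * d eps/d w + alpha * delta_t, then w <- w + delta_{t+1}.
    alpha = 0 gives the plain update rule without momentum."""
    w = np.array(w0, dtype=float)
    delta = np.zeros_like(w)  # no previous update exists before the first cycle
    for _ in range(steps):
        delta = -eta * gradient(w) + alpha * delta
        w = w + delta
    return w


w_start = [1.0, 1.0]
w_plain = train(eta=0.18, alpha=0.0, steps=50, w0=w_start)
w_momentum = train(eta=0.18, alpha=0.5, steps=50, w0=w_start)
print(f"plain:    error = {error(w_plain):.3e}")
print(f"momentum: error = {error(w_momentum):.3e}")
```

On this toy problem the plain update oscillates along the steep axis (its factor per cycle there is $1 - 0.18 \times 10 = -0.8$, so the update flips sign every cycle) and creeps along the shallow one, while the momentum run reaches a much smaller error in the same number of cycles. This is a property of the chosen example rather than a general guarantee: a momentum factor that is too large relative to $\eta$ can itself cause overshooting.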

Another advanced approach for improving the learning capability is the use of the second-order derivative, such as the Hessian matrix [Bishop,